Gender equality is an ongoing issue in the United States, particularly when it comes to employment and income. Despite progress in recent years, there is still a significant gap between men and women in terms of pay and job opportunities. Understanding the factors that contribute to this inequality is crucial for creating positive change. This project brings together data from various sources to help us better understand the geospatial condition of gender inequality, how it has evolved over time, and case studies in two industries.

1 Population and Employment by States

The gender-based unemployment data were obtained from the US Bureau of Labor Statistics at https://www.bls.gov/lau/ex14tables.htm. We focus solely on 2022 unemployment data for each US state.

## Reading layer `gz_2010_us_outline_500k' from data source 
##   `/Users/lestary/Documents/spring23/dataviz/DataVizFinal/gz_2010_us_outline_500k.json' 
##   using driver `GeoJSON'
## Simple feature collection with 615 features and 3 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -179.1473 ymin: 17.88481 xmax: 179.7785 ymax: 71.35256
## Geodetic CRS:  WGS 84
## Reading layer `gz_2010_us_040_00_500k' from data source 
##   `/Users/lestary/Documents/spring23/dataviz/DataVizFinal/gz_2010_us_040_00_500k.json' 
##   using driver `GeoJSON'
## Simple feature collection with 52 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1473 ymin: 17.88481 xmax: 179.7785 ymax: 71.35256
## Geodetic CRS:  WGS 84

This interactive map depicts the unemployment rate of each US state, with separate color-coded layers for total unemployment and unemployment by gender. Darker shades indicate higher unemployment rates, while lighter shades correspond to lower rates. The map shows that many states in the West, including California, Oregon, Washington State, and Nevada, have higher unemployment rates than the Midwest. Some states in the Northeast, such as New York, Connecticut, and Pennsylvania, also display higher unemployment rates. The map highlights geographic patterns of unemployment across the US and provides insights into disparities among regions and demographic groups.
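The palette functions used in the map code (for example `total_unemployment_palette`) are not defined in the code shown here. A minimal sketch of how such a function could be built with leaflet's `colorNumeric()` follows; the choice of the "Blues" palette and the rate values are assumptions made for illustration, not the report's actual settings or data.

```r
library(leaflet)

# Placeholder rates for illustration only; these are NOT BLS figures.
rates <- c(2.1, 3.4, 4.8, 5.5)

# Map the observed range onto a sequential palette:
# low values get light shades, high values get dark shades.
total_unemployment_palette <- colorNumeric(palette = "Blues", domain = range(rates))

# Returns one hex colour string per rate.
palette_colours <- total_unemployment_palette(rates)
print(palette_colours)
```

The same pattern, with a different palette name per layer, would give each layer its own color scale.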

# Create a basic leaflet map
map <- leaflet() %>%
  setView(-98.5795, 39.8283, zoom = 4) %>%
  addProviderTiles(providers$Stamen.TonerLite) # Base groups = background layer

# Add all layers to the map
map %>%
  addPolygons(
    data = us_states_merged,
    group = "Total Unemployment",
    fillColor = ~total_unemployment_palette(Total),
    fillOpacity = 0.8,
    color = "#000000",
    weight = 1,
    popup = ~paste(NAME, "<br>Total Unemployment:", Total, "%",
                   "<br>Male: ", Men, "%",
                   "<br>Female: ", Women, "%")
  ) %>%
  addPolygons(
    data = us_states_merged,
    group = "Male Unemployment",
    fillColor = ~male_unemployment_palette(Men),
    fillOpacity = 0.8,
    color = "#000000",
    weight = 1,
    popup = ~paste(NAME, "<br>Male Unemployment:", Men, "%")
  ) %>%
  addPolygons(
    data = us_states_merged,
    group = "Female Unemployment",
    fillColor = ~female_unemployment_palette(Women),
    fillOpacity = 0.8,
    color = "#000000",
    weight = 1,
    popup = ~paste(NAME, "<br>Female Unemployment:", Women, "%")
  ) %>%
  # Add layer controls to switch between layers
  addLayersControl(
    overlayGroups = c("Total Unemployment", "Male Unemployment", "Female Unemployment"),
    options = layersControlOptions(collapsed = FALSE)
  )

Based on the map, unemployment is relatively higher for both genders in the West and in some Northeastern states than in the Midwest. For instance, male unemployment is high in states such as Nevada, Washington, and California, and female unemployment follows a similar pattern. This suggests that gender differences in unemployment were not pronounced in many US states in 2022.
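One way to check this visual impression numerically is to compute the female-minus-male gap for each state. The sketch below assumes a data frame with the `NAME`, `Men`, and `Women` columns used in the map popups; the three rows and the 0.5-point threshold are made up for illustration and are not BLS figures.

```r
# Placeholder rows for illustration only; these are NOT real BLS values.
unemployment <- data.frame(
  NAME  = c("State A", "State B", "State C"),
  Men   = c(4.0, 3.0, 5.0),
  Women = c(4.5, 3.0, 4.0)
)

# Gap in percentage points: positive means the women's rate is higher.
unemployment$gap <- unemployment$Women - unemployment$Men

# Sort so the states with the largest absolute gap come first.
gap_ranked <- unemployment[order(-abs(unemployment$gap)), ]
print(gap_ranked)

# Share of states where the two rates differ by less than half a point
# (the 0.5 threshold is an arbitrary choice for this sketch).
share_small_gap <- mean(abs(unemployment$gap) < 0.5)
print(share_small_gap)
```

A high share of small gaps would support the reading that male and female unemployment move together across states.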

3 Breaking the Glass Ceiling: Top 10 Occupations for Women in the United States

The top occupations for women in the United States are primarily in the services industry, and a significant portion of these jobs are low-skilled or involve repetitive tasks. These jobs are necessary for our economy and provide important services to communities, but they often do not offer the same job security or opportunities for career growth as higher-skilled occupations. It is important to work towards a more equal job market in which women have access to a wider range of employment opportunities, including those that offer more skill development and higher pay.

# Create the vertical bar chart
ggplot(occupations, aes(x = reorder(Occupation, -Worker), y = Worker/1000)) +
  geom_col(fill = "#F06292") +
  labs(title = "Top 10 Women Occupations", x = "Occupations", y = "Number of Workers (in 1000)") +
  theme_economist() +
  coord_flip() +
  theme(plot.title = element_text(size = 14, hjust = 0.5),
        axis.title = element_text(size = 11),
        legend.title = element_text(size = 11),
        legend.text = element_text(size = 11),
        axis.title.x = element_text(margin = margin(t = 1, r = 10, b = 1, l = 0)),
        axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)))

4 Case Study 1: Lazega Law Firm in United States

fig <- plot_ly(x = Advice_interactions$practice, y = Advice_interactions$Interactions, color = factor(Advice_interactions$gender) , type = "box")
fig <- layout(fig, xaxis = list(
  # ticktext only takes effect together with tickvals (practice is coded 1 and 2)
  tickvals = c(1, 2),
  ticktext = c("Litigation", "Corporate"),
  title = "No. of interactions by practice, split by gender (1=man,2=woman)"
))

fig
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels


A box plot of the number of interactions for each node in the network. It is split on two levels: first by practice (1 = Litigation, 2 = Corporate), and then by gender (1 = man, 2 = woman).

## $`1`
## [1] "X1"
## 
## $`2`
##  [1] "X2"  "X8"  "X11" "X13" "X21" "X24" "X26" "X27" "X29" "X34" "X36" "X38"
## [13] "X39" "X41" "X43" "X52" "X54" "X55" "X56" "X57" "X58" "X59" "X60" "X61"
## [25] "X62" "X63" "X64" "X65" "X66" "X67" "X70" "X71"
## 
## $`3`
## [1] "X3"
## 
## $`4`
## [1] "X4"
## 
## $`5`
## [1] "X5"
## 
## $`6`
## [1] "X6"
plot(gn, dmg)

##  gn.membership  
##  Min.   : 1.00  
##  1st Qu.: 2.00  
##  Median : 5.00  
##  Mean   :11.38  
##  3rd Qu.:20.00  
##  Max.   :36.00
## $`1`
##  [1] "X39" "X41" "X42" "X52" "X54" "X56" "X57" "X59" "X61" "X66" "X67" "X70"
## 
## $`2`
##  [1] "X1"  "X2"  "X4"  "X7"  "X8"  "X10" "X12" "X14" "X15" "X16" "X17" "X19"
## [13] "X20" "X22" "X23" "X28" "X30" "X33" "X35" "X37" "X44"
## 
## $`3`
##  [1] "X3"  "X5"  "X6"  "X18" "X25" "X32" "X46" "X50" "X51" "X53" "X69"
## 
## $`4`
## [1] "X64" "X71"
## 
## $`5`
## [1] "X31" "X36" "X48" "X49" "X58" "X68"
## 
## $`6`
##  [1] "X9"  "X11" "X13" "X21" "X24" "X26" "X27" "X29" "X34" "X38" "X40" "X43"
## [13] "X45" "X47"
plot(walk, dmg)

##   name gn.membership walk.membership
## 1   X1             1               2
## 2   X2             2               2
## 3   X3             3               3
## 4   X4             4               2
## 5   X5             5               3
## 6   X6             6               3
# Second part has two plots created: a) colouring by membership and b) colouring by gender
plot(walk, dmg, col=factor(walk$membership))

The social network graph above groups the Lazega nodes into communities using a random walk algorithm. The nodes are then coloured by gender. As the graph shows, there are substantially fewer women than men.
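The gender balance within each detected community can also be tabulated rather than read off the graph. The following is a minimal sketch using hypothetical membership and gender vectors for eight nodes (not the Lazega data), with gender coded as in the box plot (1 = man, 2 = woman).

```r
# Hypothetical community membership and gender for eight nodes;
# gender follows the coding used earlier (1 = man, 2 = woman).
membership <- c(1, 1, 1, 2, 2, 2, 3, 3)
gender     <- c(1, 1, 2, 1, 1, 1, 1, 2)

# Rows are communities, columns are genders.
counts <- table(
  community = membership,
  gender    = factor(gender, levels = c(1, 2), labels = c("man", "woman"))
)
print(counts)

# Share of women within each community (row proportions).
share_women <- prop.table(counts, margin = 1)[, "woman"]
print(round(share_women, 2))
```

With the real data, the membership vector would come from the community detection result (for example `walk$membership`, as used in the plotting code).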

5 Case Study 2: Differences in gender representation in Hollywood

We conducted a text analysis of movie plot descriptions to compare word use between Hollywood films with good and bad female representation. We used the Bechdel test as a measure of female representation, which looks at whether a movie has at least two named female characters who talk to each other about something other than a man. By analyzing the plot descriptions, we aimed to understand how movies with good and bad female representation differ in the words they use. Even though the Bechdel test isn’t the sole indicator of female representation in films, this analysis can provide insights into how women are portrayed in movies and how their representation can impact the way we perceive and value women in society.
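The three criteria of the test can be written as a simple rule. The sketch below is illustrative only: the function and its argument names are hypothetical and are not part of the FiveThirtyEight dataset, which already records the test outcome for each film.

```r
# Returns TRUE only when all three Bechdel criteria hold.
# Function and argument names are hypothetical, chosen for this illustration.
passes_bechdel <- function(named_women, women_talk, topic_not_a_man) {
  named_women >= 2 && women_talk && topic_not_a_man
}

# Two named women who talk about something other than a man: passes.
example_pass <- passes_bechdel(named_women = 3, women_talk = TRUE,
                               topic_not_a_man = TRUE)

# Only one named woman: fails at the first criterion.
example_fail <- passes_bechdel(named_women = 1, women_talk = FALSE,
                               topic_not_a_man = FALSE)

print(c(pass = example_pass, fail = example_fail))
```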

The datasets were obtained from a study conducted by FiveThirtyEight on Hollywood’s gender bias. They can be downloaded from the following GitHub repository: https://github.com/rfordatascience/tidytuesday/tree/master/data/2021/2021-03-09

5.1 Word Cloud

We chose 500 movies that passed the Bechdel test and compared them with 500 movies that failed it. We analysed the text in the “plot” field. We used cleaning functions to remove stop words, punctuation, numbers, extra white space, and similar noise. We then created a document-term matrix and drew word clouds of the most frequent words among the movies that passed and failed the Bechdel test.
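The cleaning and counting steps can be sketched in base R as follows. This is a simplified stand-in under stated assumptions, not the cleaning functions used in the report: the two plot descriptions are invented, and the stop-word list is a tiny hand-picked one.

```r
# Two invented plot descriptions, used only to demonstrate the steps.
plots <- c("A young woman returns home to her family in 1995.",
           "The team must save the world, and the team has 2 days!")

# A tiny stop-word list for the sketch; a real analysis would use a fuller one.
stop_words <- c("a", "the", "to", "her", "in", "and", "has", "must")

clean_words <- function(text, stop_words) {
  text  <- tolower(text)                    # normalise case
  text  <- gsub("[[:punct:]]", " ", text)   # drop punctuation
  text  <- gsub("[[:digit:]]+", " ", text)  # drop numbers
  words <- unlist(strsplit(text, "\\s+"))   # split on white space
  words <- words[nzchar(words)]             # remove empty strings
  words[!words %in% stop_words]             # remove stop words
}

words <- clean_words(plots, stop_words)

# Word frequencies, most frequent first, with columns `word` and `n`.
freq <- sort(table(words), decreasing = TRUE)
word_counts <- data.frame(word = names(freq), n = as.integer(freq))
print(word_counts)
```

A data frame in this `word`/`n` layout is what the word cloud and bar graph code expects as input.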

5.1.1 Movies with better female representation (passed the Bechdel test) have plots whose common words include woman, school, life, girl, and family, which are family-oriented and domestic.

5.1.2 Movies with less female representation (failed the Bechdel test) have plots whose common words include life, world, story, and team, which suggests more adventure-driven movies.

# Create a word cloud with the custom blue color palette
wordcloud(words = clean_text_fail$word, freq = clean_text_fail$n, scale = c(3, 0.5),
          random.order = FALSE, colors = blue_palette(length(clean_text_fail$word)))

Based on the word clouds, we found that the most common words in the plot of movies that passed the Bechdel test are woman, school, life, girl, family, home, world, love, classic and daughter. Common words in the plot of movies that failed the Bechdel test are life, world, story, save, friends, team, and death.

In the next stage of our analysis, we presented bar graphs displaying the frequency of the top 20 most commonly used words, along with their respective counts, to provide further details about the distribution of these words. Our analysis includes three visual representations: bar graphs for both passed and failed Bechdel Test categories, as well as a pyramid plot that compares the most common words used in movies that passed and failed the Bechdel Test. By presenting these visualizations side by side, we can easily compare and contrast the frequency and type of words used in the plot descriptions of movies that passed and failed the test.

Bar graph for movie plots that passed the Bechdel Test

# Create a bar graph of the most common words in movies that passed (clean_text_pass)
ggplot(clean_text_pass, aes(x = reorder(word, -n), y = n)) +
  geom_col(fill = "palevioletred") +
  labs(x = "Word", y = "Frequency", title = "Most Common Words in the Plot of Movies \nthat Passed the Bechdel Test") +
  theme_economist() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 10),
        axis.text.y = element_text(size = 10),
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 14, hjust = 0.5),
        plot.margin = unit(c(1, 1, 1, 1), "cm"),
        panel.background = element_rect(fill = "mistyrose"),
        plot.background = element_rect(fill = "mistyrose"))

Bar graph for movie plots that failed the Bechdel Test

# Create a bar graph of the most common words in movies that failed (clean_text_fail)
ggplot(clean_text_fail, aes(x = reorder(word, -n), y = n)) +
  geom_col(fill = "steelblue") +
  labs(x = "Word", y = "Frequency", title = "Most Common Words in the Plot of Movies \nthat Failed the Bechdel Test") +
  theme_economist() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 10),
        axis.text.y = element_text(size = 10),
        axis.title = element_text(size = 12),
        plot.title = element_text(size = 14, hjust = 0.5),
        plot.margin = unit(c(1, 1, 1, 1), "cm"))

5.1.3 Pyramid Plot

We provide a pyramid plot to show how word frequencies differ between movies that passed and failed the Bechdel test. The 20 most common words are shown.
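A pyramid plot needs the two count vectors to be aligned word by word. One way to build such aligned vectors is a full join on the word, sketched below with invented counts; the data frame names are hypothetical and the numbers are not taken from the film data.

```r
# Invented counts for illustration; not taken from the film data.
pass_counts <- data.frame(word = c("woman", "life", "family"), n = c(5, 4, 3))
fail_counts <- data.frame(word = c("life", "world", "team"),   n = c(6, 4, 2))

# Full outer join on the word so each row holds both counts.
both <- merge(pass_counts, fail_counts, by = "word", all = TRUE,
              suffixes = c("_pass", "_fail"))

# A word absent from one group has a count of zero there.
both[is.na(both)] <- 0

# Order by combined frequency and keep the top words (20 in the report).
both <- both[order(-(both$n_pass + both$n_fail)), ]
top_words <- head(both, 20)
print(top_words)
```

The columns `top_words$n_pass`, `top_words$n_fail`, and `top_words$word` could then serve as the left values, right values, and labels of the plot.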

# Create the pyramid plot
par(mar=c(5,5,2,2))
pyramid.plot(clean_text_pass$n, clean_text_fail$n,
             labels=clean_text$word, 
             main="Most Common Words from Movie Plot Descriptions",
             lxcol= "palevioletred", rxcol= "steelblue", gap=20,
             top.labels = c("Passed Bechdel Test", " ", "Failed Bechdel Test"),
             xlim=c(0,50),
             laxlab = seq(from = 0, to = 50, by = 10),
             raxlab = seq(from = 0, to = 50, by = 10),
             unit = "")  # values are counts, so drop the default "%" axis label

## 50 50
## [1] 5 5 2 2

6 Conclusion

Based on our analysis of movie plots, we compared the frequency of words used in movies that passed and failed the Bechdel Test, a widely recognized standard that requires a movie to have at least two named female characters who talk to each other about something other than a man. It provides a simple metric for whether women are depicted as fully formed characters. However, a movie that fails the Bechdel Test should not be automatically labeled as anti-feminist or problematic. Instead, the test serves as a tool for critical analysis and a starting point for further examination of gender representation in media. Using it as a measure of female representation helps us better understand the patterns and biases that may exist in media and work towards more equitable and diverse representation of women in film.

We found that movies with better female representation (passed Bechdel Test) had more common words in the plot like “woman”, “school”, “life”, “girl”, and “family”, which suggests domestic themes that are traditionally associated with women. In contrast, movies with less female representation (failed Bechdel Test) had plots with common words like “life”, “world”, “story”, and “team”, which suggests more adventure-driven plots that align with traditional masculine gender norms. These findings suggest that the types of movies that pass and fail the Bechdel Test differ, and that the themes in each group reflect gender norms that have been perpetuated in media.

This type of analysis is important because it helps to shed light on potential differences in how women are represented in media, and how this representation might differ based on whether a movie passes or fails the Bechdel Test. By understanding these differences, we can start to identify potential biases and patterns that may exist in media and work towards creating more equitable representation of women in movies. Additionally, this analysis can contribute to a broader conversation around gender and media representation, and help us to better understand the ways in which media shapes our perceptions and expectations of gender roles.